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The free energy profile of a reaction can be estimated in a molecular dynamic (MD) approach by 
imposing a mechanical constraint along a reaction constraint (RC). Many recent studies have shown 
that the temperature can greatly influence the path followed by the reactants. Here, we propose a 
practical way to construct the minimum energy path directly on the free energy surface (FES) at a 
given temperature. First, we follow the blue-moon ensemble method to derive the expression of the 
free energy gradient for a given RC. These derivatives are then used to find the actual minimum 
energy reaction path at finite temperature, in a way similar to the Intrinsic Reaction Path of Fukui 
on the potential energy surface [J. Phys. Chem. 74, 4161 (1970)]. Once the path is know, one 
can calculate the free energy profile using thermodynamic integration. We also show that the mass- 
metric correction cancels for many types of constraints, making the procedure easy to use. Finally, 



the minimum free energy path at 300K for the reaction CC1 2 + H 2 C — CH 2 



is compared 



with a path based on a simple ID reaction coordinate. A comparison is also given with the reaction 
path at OK. 

I. INTRODUCTION 

Being able to understand, or better to predict, the evolution of a complex system is of critical importance in all 
areas of chemistry and biology. In turn, this understanding requires the knowledge of not only the mechanism at a 
microscopic level but also of the free energy change associated with the reaction under investigation. In principle, 
molecular dynamic simulation can give access to the free energy profile of chemical processes, and indeed, free energy 
simulations have become a key tool in the study of many chemical and biochemical problems. 1 

However, chemical reactions are usually rare events and it would require a much too long simulation time for a 
process to occur without any bias. Thus, finding an efficient way to accurately compute the free energy difference for 
a given reaction is still a very active field of research. 2-6 Several methods have been proposed to evaluate the change 
of the free energy along a given path: free energy perturbation, 7 umbrella sampling, 8 thermodynamic integration 9 
and, more recently, the Jarzynski equality. 5,6 (Sec also rcf. 10 for a review.) 

Of particular importance is the understanding of the link between constrained and unconstrained simulation, which 
was put on firm basis a decade ago by Carter et al. who introduced the Blue Moon relation. 11 During the past few 
years, this relations has been refined so that many formula are now at hand to evaluate the derivative of the free 
energy along a given reaction coordinate. 12-14 

The access to analytical gradients on the potential energy surface was a very important step forward in standard 
quantum chemistry: it became possible to find the optimum geometry of complex systems, optimizing transition 
states became easier, and frequencies could be obtained much faster with a higher accuracy. On the other hand, 
despite the fact that the exact equations for the evaluation of gradients of the free energy surface have been available 
for many years, their use has mainly been restricted to the evaluation of the free energy change along a predefined 
path. Evaluating such a free energy profile is common in both chemistry and biology. However, it seems that the 
potential of free energy gradients has not been fully appreciated. 

In this study, we propose to use the available equations for the gradient to explore the free energy surface in much 
the same way as gradients have been used to explore the potential energy surface. Special attention will be given to 
finding the minimum free energy path. 

The account of this investigation is organized as follows: In the next section (Sec. II) we first review the main 
equations employed to evaluate the derivative of the free energy along a reaction coordinate in a uniform way. In 
Section III, the scope of these expressions is extended, and their actual use in a simulation is discussed. In Section 
IV, they are then used to find a minimum energy path on the free energy surface for the addition of dihalide carbene 
to ethylene. Section V offers the concluding remarks. 
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II. THEORY 



A. Intrinsic Reaction Path 



We consider a chemical system composed of N atoms of mass rrii described by 3N cartesian coordinates Xi with 
i = 1, . . . , 3-/V. We want to construct the minimum energy path connecting the reactants to the products on the free 
energy surface (FES). On the potential energy surface (PES), K. Fukui 15 ' 16 has defined such a path: the Intrinsic 
Reaction Path (IRP). On each point of this path, the atomic cartesian coordinates satisfy 



rijdxi mjdxj 

— — - = — - — - = (1) 

dV dV ■ ■ ■ V L ) 

dxi dxj 



where V is the potential energy and m, is the mass of the atom with coordinate Xi 
This equation can be simplified by using mass-weighted cartesian coordinates: 



x\ = JmiXi i = l,...,3N (2) 

In its simplified form, equation (1) reads 

_dx[^dx^ 

dV dV ■ ■ ■ 

Thus, the IRP corresponds to a steepest descent path using mass-weighted coordinates. 

In this work, we want to construct a similar path on the FES. Therefore, we have to find a path satisfying 



dx\ _ dx' 3 

~b~A ~ ~~b~A 

dx'- dx'- 



(4) 



where A stands for the Helmholtz free energy. 

In order to find this path, we have to calculate the gradient of the free energy for each point of this path. In 
practice, this path will be discretized by a set of k points in the configurational space, i.e., by the set of k molecular 
geometries: 

x' j = {x'i;i = l,...,3N},j = l,...,k (5) 



B. Generalized coordinates 



Even though one can use the 3N cartesian coordinates to describe the system and its evolution, it is usually easier 
to employ a set of 3N-6 generalized internal coordinates, as well as the overall rotation and translation of the molecule. 

Further, chemical intuition tells us that most of the time only a few degrees of freedom (reaction coordinates) 
are sufficient to describe the reaction path. Thus, it appears natural to split the generalized coordinates into two 
sub-sets corresponding to the active coordinates, denoted by £ = (£i, . . . ,£ r ) and the inactive coordinates, denoted 
by q = (<?i) • • • j Qn)- A more quantitative criterion for this separation will be discussed later. These two sets are 
associated with two groups of generalized momenta p q and p^, and a velocity vector: 




The IRP will then be constructed in the subset of active coordinates £. 

This separation is similar to the adiabatic separation used in quantum dynamics between the slow modes (corre- 
sponding to our active set) and the fast modes (corresponding to our inactive set). One common point is that the 
inactive coordinates can vary along the reaction path. In other words, being inactive does not mean being frozen: 
an inactive coordinate is characterized by the fact that it does not contribute to the direction of the minimum free 
energy path as the thermal motions along these coordinates are nearly harmonic. However, motion along the inactive 
coordinates might contribute to the changes in the free energy as the curvature of the harmonic potential changes 
along the free energy path. This point will be illustrated in the application section. 
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It is worth discussing here the meaning of a structure on the potential and on the free energy surfaces. On the 
potential energy surface, a structure corresponds to a stationary point: for such a point, the derivatives of the potential 
are zero for all coordinates. On the free energy surface, we can use a similar description: a structure corresponds to a 
point in which the derivatives of the free energy are zero for all coordinates. By definition of the inactive coordinate, 
they do not contribute to a change in the direction of the minimum free energy path. Therefore, the derivative of the 
free energy along an inactive coordinate is zero: ^jf = f° r a ^ inactive coordinates. As a consequence, a structure 
can be define as a point in which the derivatives of the free energy are zero for all active coordinates. To complete 
the geometrical description, we will use the thermal average of the inactive coordinates during the molecular dynamic 
simulation. 



C. Notations 



Many expressions have already been proposed to evaluate the derivative of the free energy in a predefine 
direction. 12 ~ 14 ' 17 ~ 19 In this section, we will recall the expressions that will be used throughout this work. For the 
sake of simplicity, the expression will be given in the case where the active set contains only one coordinate £. It is 
worth noting that once the reaction path has been constructed, only one coordinate is needed because the reaction 
coordinate can be uniquely define as the mass weighted curvilinear distance from the reactants to the point of interest 
along the path. 

We denote by J the Jacobian matrix for the transformation from cartesian coordinates to generalized coordinates. 
x-» (q,0= 

3=1 ~ 



dq d£ 

and by | J | its determinant (that is the Jacobian). We also define J': the Jacobian matrix for the transformation 
from mass weighted cartesian coordinates to generalized coordinates. 
We introduced the mass matrix A q ^ defined by: A q ^ = J'MJ 
where M is a 3 A x 3 A diagonal matrix containing all atomic masses. 
The generalized momenta are related to the velocity vector by: 

For further convenience, we decompose A g j and its inverse Aj^ 1 into blocks: 

A ?e - ^ B | c 6 ) A H - { Y* Z i ) (7) 

Some properties of these matrices are given in Appendix A. 
Last, let us write explicitly the form of and Y^: 

e m, 8x % d Xl f-f dx\ dx\ K ' 

2—1 l — l 1 L 

K V J ^ m, &x % dx % ^ dx[ dx\ y ' 

D. Gradient of the Free Energy 

The free energy A(£*) is related to the partition function Q(£*) by A(£*) = -k B T In {Q {£,*))■ Therefore, a prereq- 

dA dQ 
uisite to the determination of — — is the evaluation of — — . 

The partition functions is defined by: 

Q(D= f dqf dp q dp 6 exp(-pH) (10) 
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where H is the Hamiltonian associated with our system 

1 



H(q,£,p„f) = -p l ^Ajp qi 



V(q,0 



(11) 



However, in a molecular dynamic simulation, constraining the reaction coordinate £ to remain constant and equal 
to £* implies imposing the additional constraint £ = 0. Therefore, the ensemble average during the MD simulation is 
not the one needed in equation (10) because p^ is not sampled. When the reaction coordinate £ is constrained to a 
specific value, £ = £*, the Hamiltonian associated with the system becomes: 

(12) 



In the following, we will denote by 



H|,(q,p 9 ) = -P*A- I p 9 + F(q,r) 
Jdqj dpqOcxp (-pm,) 



(O) 



Jdqj dp q cxp f-^H| 

;d ensemble. The notat 
al to £ * . The average fi 

Jdqd^Jdp qi Oexp(-/31i) 



(13) 



the average of a function O over the constrained ensemble. The notation ()^. indicates that the sampling is done 
along p q and q while £ remains constant and equal to £*. The average for an unconstrained simulation is: 



/ dqd£_ J dp qi exp (-/3H) 
The first step in evaluating the derivative of the free energy is to relate these two averages. 



(14) 



1. Blue Moon Correction 

Following the work of Carter et al., we can rewrite the kinetic part of the unconstrained Hamiltonian as 



We can then rewrite the average value of an operator O independent of p^: 

1 



J dq J dp q £ O exp (— /3H) = / dq J dp q O exp 



-/3 



2 

dp£ exp 



1 



--/3 f P4 + Z^Y\p q ) Z< l P€ + Z^Y\p 



Using the definition of H|» and the properties of Gaussian integrals (see eq. (Bl) of Appendix B), it comes: 

J dq J dp qi O exp (-/3H) = / dqj dp q O exp (-/3H|„) Z £ 
Thus, the relation between the constrained and unconstrained simulations reads 



-1/2 



/ dq / dp q exp ( 




oz- 1/2 


J dq J dp q exp 




) ^ 



(0) = 



which corresponds to the standard Blue Moon correction. 11 
Using equation (17), the partition functions can now be written: 



which leads to: 



dQ(Q \ 



dq J dp q exp (— /3H|„) Z^ 
o>H; 



dq I dp q e~ 0n t*Z i 



1/2 



<9£ 2 € <9£ 



(15) 



(16) 



(17) 



(18) 



(19) 
(20) 



Starting from this equation, two different procedures have been proposed. In the first one, this equation is expanded 
using the properties of the mass matrix and of the Gaussian integrals. The second one follows more closely the 
philosophy of a MD simulation and tries to relate the derivative of the partition function to the force A £ acting on 
the reaction coordinate £. These two procedures will be detailed in the next sections. 
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2. Expanding the Hamiltonian 

From equation (12), we have: 



Plugging equations (B2) of Appendix B and (21) into equation (20), one finds 



i ^A" 1 dV 
Pl^7-Pi + — ( 21 ) 



( dA~ l \ dV 1 idZ £ \ 



Using Tr (A q a ^ = — 9 and equation (A7), one finally gets: 

&4(m i ^ i _1/2 (^- fcTi * 



In the case of multiple constraints, the derivative along the constraint & reads: 12 ' 17 



7 1-1/2 / .^ ajnjjj 



(23) 



(24) 



% V <|Z€l" V % 

Although these formulas are exact, they are not as useful as one might have expected because they require the 
knowledge of the full Jacobian matrix. This in turns implies that the full set of generalized coordinates is known, 
which is something usually not desirable for a big molecule. 



3. Relation with the Lagrange multiplier 

In cases where constructing the full set of generalized coordinates q is not desirable, one has to get rid of the 
explicit dependence of the previous equations on the Jacobian. This is what is done in the second approach: instead 
of relating the derivative of the constrained Hamiltonian H|» to the Jacobian, we will relate it to the force acting on 
the reaction coordinate during the simulation. 

Indeed, during a MD simulation, one makes use of the ergodicity principle, and the averages are actually perform 
over time and not directly over the phase space. Then, in order to ensure that the reaction coordinate £ is constant 
and equal to £*, a modified Lagrangian is used: 

C* = ^v qS *A ge v qS - V(d, - D (25) 



where C is the Lagrangian of the unconstrained system and v q £ is the velocity vector. A^ is the Lagrange multiplier 
associated with the reaction coordinate £. Its value is adjusted at each step of the simulation so that £ = £*. In 
practice, the SHAKE algorithm was used for our simulations. 20 In this algorithm, A^ is adjusted to ensure that £ = 0. 

For an unconstrained simulation, the equation of motion of the reaction coordinate is: 



d fd£\ _ dC 



(26) 



which gives: 



d£ dt v ^ dt 
Taking into account that £ = in a constrained simulation, one has 



<9£ dt 

%-£m+* (2<J) 
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Demanding that £ = in a constrained simulation, we have: 

8C _ d 



Using the definition of the Lagrangian, it comes: 



f 1 ,dA Q . dV d , t .,. I 
1 .9A" 1 dV d , nf 



* ( B l<0 (33) 



<9£ 

Inserting equation (33) in the expression for the derivative of the partition function gives 



(34) 



In order to obtain the final formula we need, we have to integrate analytically the term ^ (^f'jj • This will be 
done by following the work of den Otter et al. 13 First, we use: 

^'"T, m = i fa"*!*) + W (ti Z-"'Z-^ (35, 



dt v dt n *v 2 ^ € ~e eft 

Using the ergodicity principle, we have: 21 

As £ = 0, we have = q*^-, and thus: 

Using equations (B2) and (A3), we get: 

J dp qdq e-^hBy^§ = J d Pqdq e-^B\A-^ 



(37) 



Last, 



dxi ^-r 1 doj dxi d£ dxi 

3 



and then 



Collecting equations (34-38), we have shown 



/■ 
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Using this last equation, we recover the general formula derived by Sprik et al., 12 den Otter et al. 13 and Darve et 
!.: 14 

'i=3N 



d_A = 1 / -y a . 1 dj dZ, 



^/e (40) 



This equation is readily evaluated during a simulation because all the terms depend only on known quantities such 
as £ and Z^. 

When many coordinates are constrained, the derivative along reads: 17,18 



^. Comparing the two procedures 



At first glance equations (23) and (40) seem very different and it is not obvious that they both refer to the same 
quantity. The connection between these formulas is more clearly seen if one rewrites their numerator: 



dA 
dA 



oc ( Z, 



1/2 



1/2 



2 Pq ^ Pq 



1 t dA~ l 

2 Pq Q£ Pq 



ay 
ay 




d 



2/3Z* 



'i=3N 

E 



ag az c 

dx', dx[ 




(23') 
(40') 



Then, inserting equation (39) into (40') leads to (23'). 



III. PRACTICAL CONSIDERATIONS 



The exact formulas for evaluating the free energy derivatives have been recalled in the previous section. However, 
actually using them in a molecular dynamic simulation requires to tackle two problems. The first one is related to the 
chemical system under study: one has to choose the coordinates that belong to the active space, so that all the relevant 
coordinates are considered. The second problem is more technical: the previous formulas look quite complicated to 
evaluate and one might wonder how to compute them efficiently We will first propose a way to decide whether a 
coordinate should be included into the active space. Then, we will show that in many cases, the previous formulas 
can be greatly simplified. 



A. Monitoring the active space 



The first step of a molecular dynamic simulation aiming at calculating the change in the free energy along a 
minimum energy path is to divide the degrees of freedom into active and inactive coordinates. Of course, it is possible 
to consider all coordinates active and to calculate the complete set of derivatives using the previous formulas. However, 
that would require to launch essentially one MD simulation for each degree of freedom: such a procedure would be 
quite expensive. More, in contrast to the reaction path on the potential energy surface, not all degrees of freedom 
are required: many degrees of freedom will move in a nearly harmonic well. As a consequence, the derivative of the 
free energy along these modes is small, and one can safely discard them. Such a case is observed in the forthcoming 
application for the CH distances of the ethylene molecule: the CH bond length slightly increases as the cyclopropane 
is formed, but at each point of the reaction path, they move in a quasi-harmonic well. Hence, their contribution to 
the direction of the minimum free energy path can be neglected. 

Therefore, one must select only a restricted set of coordinates. However, for complex systems undergoing a reaction, 
chemical intuition might not be sufficient. We propose here a way to construct the active and the inactive sets, and 
to monitor this separation along the construction of the path. The value of the derivative of the free energy will be 
used as a quantitative criterion to discriminate between active and non-active coordinates. Ideally, this derivative 
should be zero for an inactive coordinate. However in practice, the actual reaction coordinate has non zero component 
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on all coordinates, including inactive coordinates. The criterion will then be to compare the derivative of the free 
energy along an inactive coordinate to a given threshold. If it is bigger than this threshold then one must consider 
incorporating q n into the active set. More, we will show that one can use the data of a simulation to estimate the 
derivative of the free energy along an unconstrained coordinate. 

Let us consider a system for which the beginning of the minimum free energy path has already been constructed as 
a set of k points . . . , . The purpose of this section if to detail the procedure to find the next point, k + 1, of 
the path. By definition of the reaction path, this point can be obtained by following the gradient of the free energy 
along a small distance ds: 

The previous formulas can be applied to the data of the MD simulation conducted at point £ k to calculate the 
derivatives of the free energy along the active coordinates: < I . Before constructing the point k + 1, we 

I ^ l J i—l....,r 

should update the active and inactive sets: if the derivative of the free energy along an active coordinate is zero, then 
this coordinate should be taken out of the active set. The reciprocal question is then: shall we include any of the 
inactive variable into the active set in order to better describe the rest of the path ? Let us consider the coordinate 
q n as an example. We will now give the expression of the derivative of the free energy along this inactive coordinate q n . 
The previous formulas cannot be used directly because they necessitate that the coordinate under study is constrained 
during the simulation. Let us denote by q\ the particular value of q n at which we want to calculate the derivate of 
the free energy along q n : J^-) k In order to evaluate this quantity, one can follow the same procedure as before, 
and finds: 



(42) 



dQn ) qn=qU {\Z i \-^8{q n -ql)) i 

In this expression, we have explicitly included the delta function S(q n — qfy which ensures that the average corre- 
sponds to a conditional sampling of the phase space in which q n is equal to g„. 

If we want to avoid calculating the full set of generalized coordinates, we have to derive an expression similar to 
the equation (40) for the coordinate q n . An important point here is that, in contrast to £, q n was not constrained 
during the simulation. Therefore, the simulation data contain the sampling over p qn that would have been missing in 
a simulation where both £ and q n would have been constrained. 

The expression for the derivative of the free energy along this unconstrained coordinate q n will be done in two steps. 
In the first step, we will establish the expression for the derivative of the free energy along an unconstrained reaction 
coordinate, that is during a simulation without the constraint £ = 0. Then, we will use the Blue Moon relation to 
obtain the expression of the free energy along a non active, unconstrained, coordinate q n during a simulation with a 
constrained reaction coordinate £. 



First, using equation (6), we have: 



, - A" 1 f P A - { Xc ? Pq + Y ^ (43) 



Using this equation and eq. (A3), we can rewrite p^: 

p £ = -i^Yfpq + Z-^ = BfA^pq + Z^i (44) 



Remembering — — = (pf)- one finds: 

6 d£ dt u u ' 



(3 dp q dqdp^e-^ H j t (BlA-ip,) + j dp q dqdp ( e-P H j t (zfi 



dQ 
5£ 

The ergodicity principle allows us to say that the first integral is zero. Therefore, we have: 



(45) 



^=PJ dp q dqdp 5 e-^l (Z-H 

= P J dpgdqdpte^ 13 " (zfi-Zfi 



dZ f 
~dT 



(46) 
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We shall now pursue the derivation following a procedure similar to that of Darve et a/. 14 ' 18 
The second term of equation (46) reads: 



dp q dqdp i e^ H Zf^ = I dp q dqdp 6 e- pa Z^^=j-p 



■dZ e 



dt 



■ dZ f 



dx' 



- J d Pq dqdpse-t 3H Z^ 2 ^ (j' e l p« + J', 'p q 
We now introduce a new set of generalized momenta (p q ,p£) defined by: 

f Piq = Pq 



p i = Z- 1 i=p ( :+Z^Y\v ti 



(47) 



(48) 



This transformation is valid as it is invertible. Moreover, the Jacobian | J | associated with it is equal to 1. With 
these new variables, equation (15) reads: 



1 , _i l~t _i~ 1~ ~ 

-p q£ A g5 Pq? = -p q A, Pq + -p e Z&i 



(49) 



Equation (47) then reads 
J d p q dqdp^ H Z-H^ 



J dqe~P V J dp q e-i f3p « A « lp '> 



J dp, | J | e-k^zmz-ip^ [j V V e + (j- 1 _ z^Yl) p^] (50) 



As the Hamiltonian is even in p,, after integration over p,, only even terms remain: 



dqdp q dp, | J | e-Wz^pf^-J't 1 



Making use of eq. (B2) to integrate over p^, it comes: 



dqdp q dp^e @ H Z^ 2 £,—r^~ = / dqdqdp^ | J | e 



idZe 



dt 



-W. 



(iZ 2 dx' « 



dqdp q dp^e 



(3Z 2 dx' dx' 



Collecting the previous equations leads to: 



dA 



1 



= q I dp q dqd Pi e 1 



^ (3Z 2 dx' dx' 



(51) 

(52) 
(53) 

(54) 



This equation is the same as the one obtained by Darve et al. (eq. (24) of ref. 18), but we arrived at it with 
different assumptions. 

We now apply this equation to the case of a simulation where £ is constrained but q„ is not. Applying the Blue 
Moon relation (see eq. 18), one finds: 14 ' 18 



dA(q n ,Q 
dq n 



ry 1-1/2 X I 7-1- I ^ J T Vq n UZj qn 



'i=3N 



<I^|- 1/2 <U, 



dq n dZ q 
dx\ dx\ 



(55) 



where S qn stands for S(q n — and Z qn is the part of the inverse mass matrix corresponding to q n : Z qn = 
Si ^"ffrffr- ^ s a l rea dy noted, 18 this expression is slightly different from equation (40), despite the fact that 
they both relate the derivative of the free energy along q n to the force acting on q n during the MD simulation. 
The origin of this difference comes from the nature of the sampling used to estimate the two expressions: equation 
(55) corresponds to a conditional average performed during a simulation in which q n was not constrained whereas 
equation (40) corresponds to a simulation in which both £ and q n were constrained. It is worth stressing here that 
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both expressions are valid but correspond to different contexts. More, as long as the sampling along q n is sufficient, 
evaluating the derivative of the free energy along q n during a simulation in which only £ is constrained using the 
conditional averaging of equation (55) will lead to the same numerical result as equation (40) using a MD simulation 
with both £ and q n constrained. 

Practical use of this equation is described in Appendix D. 

To conclude this section, we detail one possible use of equations (40) and (55) for the construction of a reaction 
path on the free energy surface. We suppose as before that the active set is known and denoted by £, and that the 
path is partially constructed, up to the point k corresponding to the value £ fc = } r of the active coordinates. 

To find the next point £ fe "'" 1 , one should: 

1. Launch a simulation while constraining each active coordinates & (i = 1, . . . , r) to be constant, and equal to 

2. Use formula (40) (or 23) to compute the derivative of the free energy along the active coordinates: 

3. taken out of the active set the coordinates corresponding to zco derivatives, 

4. Then, use the same simulation data and equation (55) to evaluate the derivative of the free energy along the 
inactive coordinates -g^, then include into the active set the coordinates associated to non-zero derivatives, 

5. Recollect all derivatives to obtain the full gradient: gradA = , ^ j • 

6. Construct the next point of the path by following the gradient of the free energy along a small distance ds: 
Such a procedure has been used to study the addition of CC1 2 to ethylene, and is explained in the section IV. 



B. Special types of constraints 

The preceding equations (23, 24, 42 and 55) have been derived without considering the actual form of the constraints. 
Therefore, they are general but they look quite difficult to compute efficiently. Despite the fact that a general 
procedure has already been given by Darve et aZ., 14 , we would like to show here that many common constraints lead 
to considerable simplifications of the previous expressions. Four cases are described here in which is a constant. 
In those cases, the previous equations become: 

F)A 

^ = <*e>*. (56) 

dA (Sq n {-Z qn q n ))f, 
1. Bond distance 

If the reaction coordinate is chosen to be the bond distance between atoms i and j, we have: 19 

£ = dij = yj (Xi - X-j) 2 + ('X l+1 - X j+1 ) 2 + (x i+2 - X j+2 



1 1 

to,- rrii 



2 

(58) 



Care must be taken when constraining many bond distances. In this case, might no longer be a constant, and the 
above simplifications should not be applied blindly. Let us consider a simulation with two constrained bond distances. 
Two cases can be met depending on whether these two bonds share a common atom or not. 

We consider first the case where the two bonds do not share a common atom: for example, bonds between atoms 
i and j and between k and /. The matrix is: 

' - I) < 59 > 
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with, Z dij = 1 /m i + 1 /m j and Zd kl — 1 /m k + 1 jm l . As a consequence, | | is equal to | Z dij \ \ Z dkl | and is a constant: 
one can use simplified equations (56) and (57). 

Let us consider now the case where the constrained bonds share a common atom: for example bonds between atoms 
i and j and between atoms j and k. We also denote by a the angle between the three atoms: 




The matrix reads: 



Z d , 



Z d jk 



(60) 



i 


— dij 


dki 




1 


1 








m. 


rrij 



As a consequence, | Z^ | is no longer a constant and equations (41) and (55) must be used. 

Another commonly used reaction coordinate is the difference of two distances. Once again, one has to consider the 
case where the two distances share a common atom or not. When the two bonds are not sharing any atom, i.e when 
they are involving atoms i, j, k and we have: 

= d^ — dki 

J- + J- (61) 

m k mi 

On the other hand, when the two distances share a common atom j, the previous equations become: 

£ dij djfc 

112, , (62) 

Zf = 1 1 (1-cosa) 

mi m k rrij 

So that Z^ is not a constant and the correction terms should be evaluated explicitly. 

2. Generalized distance 

Another quite common reaction coordinate is the mass-weighted distance between one reference geometry and the 
current geometry of the system. 19,22 We note {x/i] i = 1, . . . , 3A^} the cartesian coordinates of the reference geometry. 
We have: 

£ = / > rndxi — Vi) 2 

(63) 



3. Bond angle 

We now consider constraining the angle a between the atoms i, j and k, that is the angle between the bonds ij 
and jk. We denote by d^, dj k and d^ the distances between these atoms. With these notations, Z^ Reads: 23 

Z i = Z a = —L- + — L- + 5, 2 (64) 
rrndfj m ul-,, mjdfjd^ 

Therefore, Z^ is not a constant and one must use the complete formulas. However, if the bond distances dij and 

djk are also constrained, the formulas can be simplified. When considering a simulation with a, dij and djk all 
constrained, Z^ reads 23 

V _ sin a _ sin ; 

a rrij dj k rri j dij 

Zsin ol ry cos ol (rr\ 

e = I ^^jdj: Zd *3 ~ (° 5 ) 

sin a cos a y 

rrij da rrij ^jk 
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Developing | Zj |, one finds: 



Z ? \=Z a [ -J— + -!— + — L_ ) ( 66) 
1 rriimk mirrij rrijirik 



Thus, the derivatives | | of | | are not zero but | Z^ | is constant during a simulation. Using this fact, 
equation (41) reads: 



OA 


- 4 ( <A&> + 


fc(T) 


Wk 


2Z a 




j—r 


(i=3N 


with Bk 




E 









S fc J (67) 
d& dZ a } 



(68) 



As a, c?ij and dj^ are all constrained, Z a and thus are easily computed during (or after) the simulation. We 
conclude the discussion of this case by noting that despite the fact that one should take the corrective terms -j^-B^. 
into account, they are of the order of magnitude of some tenth of kT and thus one might wonder if they arc negligible 
or not. This point will be discussed in greater details in the next section. 



4- Linear Constraint 

Linear constraint can be written as £ = a^Xi and thus, Z$ = m~ 1 af is a constant. 

This type of constraint has already been used in order to calculate the free energy change along the instrinsic 
reaction path constructed on the PES. 24 

These constraints are of considerable practical importance. First, in a simulation, it is quite common to constrain 
also the global rotation and the global translation of the molecule. Equations similar to the previous equations (23) 
and (42) can be derived by replacing | Z^ | by | Z^t.r |- By construction, the global rotation and translation are 
orthogonal to all internal coordinates. Therefore, we have =| Zc || Zt.r |. 

We show in appendix C that constraining the overall rotation and translation corresponds to applying six additional 
linear constraints on the system. 

As a consequence, the term | Zt.r | is a constant that can be ignored when calculating the derivatives of the free 
energy, leading to the following formulas: 

(69) 

Second, let us consider a simulation with many active coordinates, all described by linear constraints (which may 
include the translation and rotation constraints). Let us denote by i = 1, . . . , r} the r linear constraints applied 
during the simulation. We have 

j=3N 

= c ij x j F° r i m l) ■••)'" (71) 

3=1 

The Lagrangian associated with this simulation is: 

£* = lx*Mx-nx) ^ 

i=l 

The equations of motion are then: 

d£' 

mj£j = ~d^ + ^ Xi dx L For j in 1,...,3A (73) 

3 i — i 
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Demanding that £j = for all i in 1, . . . , r leads to a set of coupled linear equations: 

3N „ T/ k=r j=3N 

E—ah+E^E^ 1 -" F„ riinl ,..., r ,74, 

j=i 7 J fe=i j=i J 

To find the forces acting on the constraints, i.e. to find the values of all Afc, we have to solve this linear system. We 
define: 

Di = V —^-dj For i in 1, ... ,r (75) 

' m 7 - as, 

3 

Using the definition of Z^, it is easily seen that the term Ylj=i C ^ C .' J corresponds to the element [Z^] ifc . In matrix 
notation, equation (74) reads 

Z^A = D (76) 

which is easily solved by inverting the matrix Z^ which depends only on the definition of the linear constraints 
£i. This last expression can be further simplified when the constraints are expressed in the mass- weighted cartesian 
coordinates frame: 

j=3N 

6 = V Fori in l,...,r (77) 

z * /Til ■ J 



, /m, 
i=l v 

Requesting that the linear constraints form an orthonormal set in the mass weighted basis leads to: 

j=3N 

&|&>= £ ^ = S ik (78) 

Therefore, in the case of orthogonal constraints, the inverse of the mass matrix reduces to the identity matrix and 
the previous equations (75-76) become 

3N 1 dV 



m 7 - dx n 
j =1 j j 



More, equation (41), that should be used to calculate the free energy derivatives in the case of multiple active 
coordinates, reduces to its simplest form, similar to (56): 

dA 



k 



(A^) r For k in l,...,r (80) 



This illustrates one convenient property of orthogonal linear constraints: they are all decoupled which leads to 
considerable simplification for the calculation of the free energy derivatives. 

However, the most interesting aspect of these constraints appears when one consider that the full set of generalized 
coordinates consists of linear coordinates: 

j=3N 

& = Y -5M For i in 1, . . . , r (81) 



3=1 v 3 
3=3JV 

qk = V -^-x'i For k in r + 1, . . . ,3iV (82) 



i=i v J 

Using the previous equations, one finds that the force acting on an inactive coordinate qk during the MD simulation 
is 

j=3N gv 

F i^H— d^.^ For k in r + 1, . . . , 3iV (83) 

j=i J 3 

Comparing equations (79) and (83) shows that the effect of the Lagrange multiplier A; is to exactly compensate 
the force acting on the active coordinate £j during the MD simulation, thus ensuring that it remains constant. 

These last equations show the advantage of linear constraints over other constraints: one can estimate the gradient 
of the free energy for the full set of generalized coordinates by analyzing the data of one MD simulation. The only 
condition is to have sufficient sampling along non active coordinates. Therefore, one can launch a simulation with a 
small active set, ideally comprising only the reaction coordinate, without loosing any information. 
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IV. APPLICATION 



It is worth noticing that once we have an expression from which to calculate the free energy derivatives, we can 
apply the same algorithms as those used in quantum chemistry in connection with potential energy gradients. For 
example, it is possible to optimize a structure directly on the free energy surface in the subset of the active coordinates. 
It is further possible to find a transition state along a path, to calculate the Hessian by finite difference, and thus 
to characterize the structures anywhere on the path. This will be applied in this section to the addition of CCL to 



ethylene: CC1 2 + H 2 C=CH 2 



ci. 



We will first optimize the structure of the transition state and the product 



at 300K, and compare the geometrical parameters to those of the OK geometries. We will then focus on constructing 
the minimum free energy path at OK and 300K. 

This reaction has already theoretically been studied in our group, 25 as well as in other groups. 26 ' 27 In particular, 
possible deficiencies of the DFT methods to describe the long range interactions have already been stressed. However, 
our goal here is to construct the reaction path on the free energy surface for this reaction and to compare it with 
the path obtained with a predefined reaction coordinate. In such a comparison, the accuracy of DFT is not the main 
issue. 

Previous studies have shown that the reaction proceeds in two steps: the first phase corresponds to the electrophilic 
addition of the carbene to one of the carbons making up the double bond. This phase proceeds through a transition 
state that has been optimized. The final phase is a nucleophilic attack on the second carbon of the double bond to 
close the cycle. This phase proceeds without any barrier, directly after the first one: 



As already noticed, 25 the distance between the center of the double bond and the carbon of the carbene is a good 
reaction coordinate for the first phase, but is not sufficient to accurately describe the second phase. Therefore, we 
have decided to include three coordinates into the active set: dec'- the ethylene CC bond distance, dca'- the distance 
between the carbon atom of the carbene and the center of the double bond denoted by G, and a the angle between 
the double bond and this last distance. This is depicted on the following scheme: 



C 3 



o 
y 

T3 



Gi 



...dec 



C 2 CI 5 



For further reference, the previous study using only the dec distance as a reaction coordinate will be referred to 
as the ID study by opposition to the present study that uses a 3 coordinates active set and will thus be referred to 
as the 3D scheme. 

The matrix corresponding to the constraints dec dec and a reads: 





( Zd CG 













Zdcc 







V o 








This form the the Zj matrix is quite different from that obtained in cq. (65) for a system of two bonds sharing a 
common atom. This comes from the fact that the elements of this matrix are defined with respect to the coordinates 
of the atoms C 1 , C 2 and C 3 whereas a and dec are defined with respect to C 1 , C 3 and G. As G is the center of mass 
of C 1 C 3 , it plays a symmetric role in the expressions of the elements of the Zj matrix, leading to compensating terms 
which sum up to zero. As an example, let us consider the off diagonal term Zd cc d C G between the double bond C 1 C 3 
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and the distance GC 2 . Simple algebra gives 

ddcc _ ddcc 
ddcc 



(85) 



= (86) 

(87) 



dx 2 

1 ddcc 1 ddcG 



mi <9xi TO3 3x3 
which leads to 

1 drf C c drf C G _ dd cc ( 1 d^cc 1 dd cc \ _ 
iccfe "^m, dx; cbq _ 9 Xl I mi dx t m 3 9x 3 J 1 ' 

Similar cancellations appear for the other non diagonal terms. 

We will now give the expression of Zd CG , Zd cc and Z a . Even though in our case all three atoms have the same 
mass, we will write the following formula for the general case where all three atoms have different masses and G is 
the center of mass of C 1 C 3 . Tedious but straightforward algebra leads to: 

miXi + m 3 x 3 

x G = ■ (89) 

Tii + rnz 

Z dcG = — + 1 (90) 

7712 mi + m 3 

Zdcc = — + — (91) 
mi 7B3 

ry _ ^dca 1 ^dcc fc\o\ 

Z « - -72 1" -j2 l^j 

a CG U CG 



I — Z a Zd GC Zdcc (93) 



Finally, derivatives of the free energy read: 



dA 1 f/X \ k<T> Z dGC \ 

ddc~ G = Q [ {XdcG) z—^) (94) 

^ = L f ( . ) _k<T>ZdCG\ 



ddc^-Q \ {Xdcc) z a d%- c ) (95) 



dA 
da 



(A dcG ) (96) 



A. Stationary points 

1. Transition State 

Optimization of the transition state structure was carried out by employing the quasi-Newton scheme 28 using the 
formula proposed by Bofill to update the Hessian. 29 This procedure will be described in details elsewhere. 30 This 
quasi-Newton scheme is an iterative procedure that requires initial values of the three parameters dcc^ dec and a. 
These values were taken from the previous ID study, 25 in which the transition state (TS) was located at dec = 4.5 
ao: this was taken as the initial value of dec for our 3D optimization. The thermal average of the ethylene bond 
length and of the angle during the previous ID simulation with dec = 4.5 ao were taken as initial values for dec 
and for the angle a. The Hessian at this initial geometry was calculated employing finite differences of the gradients. 
Diagonalization lead to only one negative eigenvalue proving the transition state nature of the starting point. The 
Hessian matrix was also calculated and diagonalized for the final geometry: we found only one negative eigenvalue, 
corresponding to an eigenvector directed mainly along the dec variable. 

The resulting geometry is reported on Figure 1, together with the geometry obtained at 300K when only dec IS 
constrained, as well as the geometry obtained on the PES at OK. The main geometrical parameters arc given in Table 
I. 
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/ 2.405 / 2 .405 /2.487 



^©§r©^ ^€>Sr^© 

300 K - ID 300 K - 3D OK 

FIG. 1: Transition states geometries for the addition of dichlorocarbene CC1 2 to ethylene, optimized at OK, at 300K with only 
the dcG distance constrained (ID), and at 300K with dec dec and a all constrained (3D). G is the midpoint of the double 
bond. Distances in A, angles in deg. 





dca (A) 


dec (A) 


a (deg.) 


ID 


2.405(2) (a) 


1.364(2) (b) 


115(1) (6) 


3D 


2.405(2) 


1.362(2) 


116(1) 


OK 


2.487 


1.356 


116.5 



TABLE I: Main geometrical parameters for the transition state geometries for the addition of the dichlorocarbene CC1 2 on 
ethylene, optimized at OK, at 300K with one constraint (ID), and at 300K with three constraints (3D), (a) Estimate of the 
uncertainty. The uncertainty for structures on the potential energy surface is much smaller, and thus not quoted, (b) Average 
values. 

All three structures are non symmetric: the a angle is approximately equal to 115° instead of 90°. This is in 
agreement with the Woodward-Hoffman rules and with previous calculations 27, 31-34 and experimental results. 34 More, 
all geometries correspond to an early transition state in which the ethylene molecule is only slightly distorted. The 
double bond length equals approximately 1.36 A, close to the equilibrium distance in the free molecule: dec — 1-33 

A. 

The two structures found at 300K are almost identical. This is not surprising: as already stated the distance dcG 
between the middle of the double bond and the carbon atom of the carbene is a good reaction coordinate up to this 
point. 

The transition state geometry at OK is in fairly good agreement with previous ab initio calculations 27 ' 31-33 except 
for the dcG distance which is slightly overestimated here: dca — 2.487 A compared to dcG = 2.37 A at the B3LYP/6- 
31G* level, 33 or d CG = 2.38 A at the MP2/6-31G* level. 27 - 32 ' 33 This is due to the fact that this region of the potential 
energy surface is shallow so that the location of the transition state depends strongly on the functional and on the 
basis set used. Similarly, the a angle is also a bit overestimated: it equals 116.5° in our study, whereas it equals 
respectively 111.7° and 112.2° at the MP2/6-31G* and B3LYP/6-31G* levels of calculation. 27 ' 32 ' 33 

Going from the transition state structure found at OK to that obtained at 300K leads to an increase of the double 
bond length due to thermal vibrations. On the other hand, the dec distance is smaller at hight temperature. This 
comes from the fact that the barrier originates mainly from rotational entropy lost when the transition state is 
formed. As the average rotational momentum increases with the temperature, we expect this barrier to move to 
smaller distance as the temperature rises. This is in agreement with previous studies. ' 

2. 1 , 1 '-dichlorocyclopropane 

The main geometrical parameters for the 1,1 '-dichlorocyclopropane are reported in Table II and the corresponding 
structures are given in Figure 2. All three structures are very symmetric: a = 90° which means that the carbon atoms 
form an isocel triangle, its base being the former ethylenic bond. All three carbon-carbon bond length are now close 
to that of a single bond: d c i C 3 = 1.53 A, dcic 2 = ^c 3 c 2 = 1-50 A. 

Similarly to what was observed for the transition state structures, there are very little difference between the room 
temperature ID and 3D structures. This comes from the fact that the gradient is zero for a minimum, so that 
the definition of the reaction coordinate does not matter anymore. To further assess this point, we have launched a 
simulation with no constraints. The average values of dec dec and a are in very good agreement with the constrained 
simulations: d CG = 1-291 ± 0.002 A, d cc = 1.533 ± 0.002 A and a = 90. ± 1°. 

The OK structure is in good agreement with previous calculations: the difference in distances and angles is less 
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300 K - ID 300 K - 3D OK 

FIG. 2: Geometries of the l,l'-dichloro cyclopropane optimized at OK, at 300K with only the dee distance constrained (ID), 
and at 300K with dec, dec and a all three constrained (3D). G is the midpoint of the double bond. Distances in A, angles in 
deg. 





dec (A) 


dec (A) 


a (deg.) 


ID 


1.289(2) (f,) 


1.535(2)^5 


90(1) (6) 


3D 


1.289(2) 


1.532(2) 


90(1) 


OK 


1.288 


1.527 


90.1 



TABLE II: Main geometrical parameters for the l,l'-dichlorocyclopropane, optimized at OK, at 300K with one constraint (ID), 
and with three constraints (3D), (a) Estimate of the uncertainty, (b) Average values. 

then 0.01 A, and 1°, respectively. 31 When going from OK to 300K, the C-C bonds elongate, while dcG and a remain 
constant. This comes from the fact that this molecule has a C 2v symmetry which imposes that the average angle 
should be close to its value on the potential energy surface. 

B. Reaction Path 

We focus now on the core of this application: the construction of the reaction path connecting the reactants to 
the product, directly on the free energy surface. The aim of this part is dual: first, we show how to use the previous 
formula on a real example. Second, we compare the 3D path constructed here to the path obtained with only one 
'chemically intuitive' reaction coordinate. Starting from the transition state, we have constructed the forward reaction 
path leading to the product and the backward path leading to the reactants. 

In this work, we have moved along the gradient employing a small step size: at each point x k of the path, a 
simulation is launch while constraining all three active coordinates dcd dec an d a. The previous formulas ((94), 

(95) and (96)) are used to compute the free energy gradient: g = ( ddta ' ddec ' fa) ' ^ e ^ en convert this gradient 

into a normalized mass weighted gradient: gMW = ■A/'Zcg, with N = (g*Z^g) -1 / 2 . The next point x k+1 is calculated 
by following the gradient on a small distance ds: 

x k+i = x k _ dg x gMw (Q7) 

An alternative way is to employ the scheme derived by Gonzalez et a/. 35 which allows for the use of a much bigger 
stcpsize. The forward path corresponds to the closure of the cycle, that is to say to the formation of the second 
carbon-carbon bond. The gradient along this path is large, and we used a stepsize of 0.5 a.u. 36 The backward path 
corresponds to the departure of the carbene, which is the reverse of the electrophilic addition. The change in free 
energy is small in that direction, and we had to employ a smaller stcpsize of 0.2 a.u. to minimize the statistical noise. 

1. Free energy profile 

The resulting free energy profile (FEP) is reported in Figure 3 together with the profile generated with one 
constraint. 25 In order to compare them, both paths have been plotted using the dca distance as an approximate 
reaction coordinate. The FEP obtained in the present work is given as an inset in Figure 3. The two profiles are very 
similar: first, the free energy increases smoothly from the isolated reactants to the transition state. Then, it decreases 
abruptly when the cyclopropane is formed. 
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On an energetic ground, in agreement with the fact that the free energy is a state function, both path lead to a 
similar change in free energy equal to: AA corr = -47.6 kcal-mol -1 in this work, and AA corr = -46.5 kcal-mol -1 in 
our previous work, when employing 40000 steps. More, as dcG is a good reaction coordinate for the first phase, the 
barrier is also similar in both cases: AA^f rr w 11.5 kcal-mol -1 in this work, compared to AA™ r sa 11.7 kcal-mol -1 
previously. These values are in good agreement with the previous studies. 25,32 




FIG. 3: Free Energy Profile (FEP) for the addition of dichlorocarbene CC1 2 to ethylene. The ID profile is calculated using 
dec as the predefined reaction coordinate, whereas the 3D profile uses an active set of three coordinates (dec dec, ot). The 
inset shows the FEP plotted against the curvilinear distance from the transition state along the 3D path. G is the midpoint of 
the double bond. 



2. Geometrical parameters 



Even thought the two free energy profiles are similar, the two path are actually quite different in terms of geometrical 
parameters. We discuss here the variations of the structural parameters dec and a for both path. The evolution of 
the double bond distance dec and of the a angle obtained in this study are reported on Figure 4 together with the 
average values (dec) and (a) obtained in the previous study using only dee as a reaction coordinate. 

The first feature worth noting is that the forward path, corresponding to the the formation of the second C-C bond, 
is not parallel to the dee axis but acquires contributions along both dec and a. This confirms that an accurate 
description of this step requires a more complex reaction coordinate than just the dec distance. As a consequence, 
despite the fact that dee varies only from 2.40 A to 1.29 A for the second phase, the actual path length is of the 
same order of magnitude as for the first phase: s w 10 a.u. 

The evolution of the two structural parameters dec and a can be divided into three zones. The first one corresponds 
to dec ^ 2.6 A. In this zone, there is little interaction between the two reactants: dec is constant and equal to 1.33 A, 
which is the standard double bond length, and a tends to 90°. 37 This comes from the fact that, for the unconstrained 
system, at large separation there is free rotation of the carbene around the ethylene molecule. As a consequence, a 
should vary freely between 0° and 180°, with an average value of 90°. 

The second zone corresponds to the electrophilic addition, that is the formation of the first bond between the 
carbene and the ethylene molecule. In this zone, dec evolves from ca. 1.8 A to ca. 2.6 A. As the two molecules start 
to interact the ethylenic bond elongates from 1.33 A to ca. 1.48 A. As expected, this last value is intermediate between 
a single and a double carbon-carbon bond length. Simultaneously, a increases to 128°. This is a consequence of the 
fact that the symmetric approach for which a = 90° is forbidden, and is thus associated with a very high barrier. 34 
During this phase, the C X C 2 bond is formed: this distance decreases from 2.4 A to 1.56 A, which is close to a single 
bond. 

The last zone corresponds to dec < 1.8 A and represents the closure of the cyclopropane ring. The C X C 3 bond 
(formerly the double bond) length increases to its finals value of 1.538 A, while a drops to 90°. Comparing the path 
constructed using a 3 coordinates active space to the path previously obtained shows that a is a much better reaction 
coordinate for this phase than dee- Indeed, during this phase the C^C 2 distance is almost constant and close to 
1.54 A, while the C 3 C 2 distance decreases strongly from 2.3 A to 1.54 A. This illustrates that the second phase of 
the reaction is actually the bending of the C 1 C 2 bond towards the C 3 carbon atom. For this type of movement a 
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1.55 




1.5 2 2.5 3 3.5 

d CG (A) 

FIG. 4: Reaction path for the addition of dichlorocarbene to ethylene. Two projections are show: in the dec — dcG and 
ot — dec subspace, G being the midpoint of the double bond. The ID path is obtained using dcG as the reaction coordinate, 
whereas the 3D path uses an active set of three coordinates (dec dec, a). 



is a good reaction coordinate whereas dec is a poor one. As a consequence, when only dec is used as a reaction 
coordinate, a drops abruptly from 115° to 90° around dec = 1-36 A. On the other hand, the variations of a are much 
smoother with our active set. 

To conclude this section, we would like to point out that, as expected, both path are qualitatively similar, with 
the same overall evolutions of the geometrical parameters. However, they differ significantly on a quantitative basis, 
especially in the third zone. This is due to the fact that dec is not a good approximation to the reaction coordinate 
in this zone. To illustrate how this explains why the two path are different, let us consider a force acting on the dec 
variable at the point A (Fig. 4) located in this zone. We denote the local coordinate set by (£,u,u), with £ being the 
reaction coordinate. From the definition of a reaction coordinate, we have: 



dA _ 

dA 
du 

ov 



g ? (98) 
= (99) 
^0 (100) 



As dec is not a good approximation to the reaction coordinate, £ depends not only on dec but also on dec and 
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a. As a result, the derivative of the free energy along dec is related to the gradient of the free energy: 



dA 



dA d£ dA du dA dv 

2 1 1 

d£ ddec du ddec dv ddec 



ddec 




(101) 



Thus, if dec is not constrained, the system will evolve in order to minimize the free energy in that direction, and 
the two paths will not coincide. 



It is worth analyzing the importance of the B terms in equations (94)-(96). They are reported on Figure 5 together 
with the average value of the Lagrange multipliers. It appears that those terms are usually smaller than the Lagrange 
multiplier by at least an order of magnitude. However, as the reaction path is following the free energy gradient, 
these small differences are accumulated along the path, leading to a non-negligible correction. For example, for the 
studied reaction, the free energy correction along the path can reach 1 kcal-mol . The larger correction for the 
geometrical parameters is observed for the evolution of the double bond length: A (dec) ~ 0-01 A. However, the 
structural differences for the product and the transition state are much smaller: A(d) < 0.005 A and A (a) < 0.5°. 
The correction for the total free energy difference is only A(AA corr ) = 0.1 kcal-mol -1 . 

We would like to draw attention to the fact that, even thought those terms are generally not negligible, one should 
not forget that we are using classical mechanics, and that all quantum effects are missing for the description of the 
nuclei motion. In particular, ZPE and tunneling are completely neglected. 



Following the procedure of Gonzalez et al., 35 we have constructed the reaction path at OK. As the differences 
between the potential energy profile and the free energy profile have already been discussed, 25 we shall focus here on 
the differences between the path at OK and 300K. The two path are reported on figure 6. 

At large separation, that is for the electrophilic addition, dec is a good reaction coordinate at both temperatures. 
More, as long as the two reactants interact only weakly, the thermal motions are mainly vibrations of the two molecules. 
As the potential energy surface is almost harmonic for such vibrations, the evolutions of the double bond distance 
and of the angle a are similar at 300K and at OK. On the potential energy surface, as the distance between the two 
fragments increases, the interaction depends less and less on the angle, leading to some spurious evolutions. On the 
free energy surface, this independence of the interaction on the angle leads to an average value of 90° instead. 

When the interaction starts to be stronger, these oscillations of the angle disappear and both path are qualitatively 
similar. However, due to the vibrational thermal energy, at 300K the parameters changes are larger. 



The Car-Parrinello projector augmented-wave (CP-PAW) 38 ' 39 program by Blochl was used for all AIMD calcula- 
tions. In the CP-PAW calculations, periodic boundary conditions were used in all examples with an orthorhombic 
unit cell described by the lattice vectors ([0, 14.74, 14.74], [14.74, 0, 14.74], [14.74, 14.74, 0]) (bohr, 7.8 A). The energy 
cutoff used to define the basis set was 30 Ry (15 a.u.) in all cases. Because the systems of interest are all isolated 
molecules, only the T-point in k-space was included and the interaction between images was removed by the method 
proposed by Blochl. 39 The approximate density- functional theory (DFT) used here consisted of the combination of the 
Perdew-Wang parametrization of the electron gas 40 in combination with the exchange gradient correction presented 
by Becke 41 and the correlation correction of Perdew. 42 The SHAKE algorithm 20 was employed in order to impose the 
constraints. The mass of the hydrogen atoms was taken to be that of deuterium, and normal masses were taken for 
all other elements. 

Room temperature CP-PAW calculations were performed at a target temperature of 300 K. The Andersen 
thermostat 43 was applied to the nuclear motion by reassigning the velocity of N randomly chosen nuclei every n 
steps where N and n are chosen to maintain the desired temperature. In our case this amounted to one velocity 
reassignment every 20 steps. Thermostat settings were monitored and adjusted if necessary during the equilibration 
stage, with the main criteria for adequate thermostating being the mean temperature lying within a range of 300 ± 10 



C. Importance of the correction to evaluate the gradients 



D. Effect of the temperature 



E. Computational details 
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d CG (A) 

FIG. 5: Importance of the correction terms in the evaluation of the free energy derivative. (Xx) is the average of the Lagrange 
multiplier. Bcc = — kT d S g and Bcc ~ —kT d S c are the correction to the derivative of the free energy along dcG and 

Q CG Zad CC 

dec respectively. G is the midpoint of the double bond. 

K and a temperature drift lower than 1 K/ps. In combination with the Andersen thermostat, a constant friction 
was applied to the wave function with a value of 0.001. Following the conclusions of the previous study, for each 
simulation, we performed between 35000 and 50000 steps in order to ensure that the system was fully equilibrated 
and that the temperature and the free energy gradient were fully converged. 

The free energy profiles were obtained by numerical integration of the gradient along the path, using a procedure 
similar to the pointwise thermodynamic integration (PTI) method. As the overall rotation and translation of the 
molecule are frozen during a simulation, one has to correct the free energy obtained from a simulation. We have used 
the procedure of Kelly et al. 25 To summarize, the overall correction for the entropy is the sum of the translational 
and rotational entropy: 

AS£ B r (s) = S£ B (s) + S$ B (s) - S£(oo) - S|(oc) (102) 

Where S^ B (s) is the rotational entropy at RC = s which is geometry dependent, and S^ B (s) is the translational 
entropy at RC =s. The last two terms represent the translational entropy of the isolated species A and B. These terms 
are calculated using standard formula for the partition functions. Finally, the total free energy change AA^ r (s) is 
obtained from a CP-PAW simulation with the constraints described above as 

AA% B M {s) = AA^ w (s) - TAS£ B rr (s) 

where AAp B ^ w (s) is the change in free energy obtained directly from the simulation, and CM (classical mechanics) 
refers to the fact that the motion of the nuclei is described using classical mechanics. It should be mentioned that the 
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FIG. 6: Reaction path for the addition of the dichlorocarbene to ethylene at OK and 300K. Two projections are show: in the 
dec — dcG and a — dco subspace, G being the midpoint of the double bond. 

zero-point energy (ZPE) correction is not included in our simulations. This should not seriously hamper our objective 
which is to analyze the qualitative differences between the path obtained with one and three constraints. 

V. CONCLUSION 

In this work, we have proposed a new look at the standard formulas for evaluating the derivatives of the free energy 
along a reaction coordinate. 

First, we recollected the different formulas available in a uniform approach. These formulas allow one to compute 
accurately and efficiently the gradient of the free energy for an unconstrained system using a constrained molecular 
dynamic simulation. 

The main finding of this investigation is a set of equations that makes it possible to construct a minimum free 
energy reaction path instead of calculating the free energy changes along a predefined path. Indeed, we believe that 
one can use the free energy gradients in the same way potential energy gradients have been used in the past 20 years 
in quantum chemistry calculations. 

Addition of the dichlorocarbene to ethylene was studied as a numerical example. It was shown that a simulation 
using only one constraint is not sufficient to describe the whole path. Using the free energy gradient in a subset of 
three active coordinates lead to a smoother path, refining the understanding of this process. 
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APPENDIX A: PROPERTIES OF THE MASS MATRIX 

Using A g £ = I, one readily finds: 

X,A,+Y € B^ =I 3 n-i (Al) 

X,B 4 +Y 4 Q = (A2) 

Y|A 9 + Z 6 B\ = (A3) 

Y|B 4 + Z £ C e = 1 (A4) 

Plugging equation (A3) into equation (Al) leads to 

A" 1 = X, - Y^Yl (A5) 

The last relation we need derives from: 

(^)-^(b ; :W'tI;) 

Equating the determinant of the first and last term, we get: 

lA^HAJI^rH J*MJ|=|J| 2 |M| (A7) 

APPENDIX B: PROPERTIES OF GAUSSIAN INTEGRALS 

We recall here the main properties of Gaussian integrals: 

due(~ u ' Au ) oc| A \~ 1 / 2 (Bl) 
J due(- utAu )u 4 Bu oc ^Tr (A^B) / due(- u * Au ) (B2) 

APPENDIX C: CONSTRAINING THE OVERALL ROTATION AND TRANSLATION 

In this section, we derive the expressions used to constrain the overall translation and rotation of the system. For 
this, we start from a reference geometry x° and we seek the conditions that should be satisfied by the new geometry 
x. The Center of Mass G is defined by: 

X G = 



i=l 



III; 



where m, is the mass of the atom i with coordinate Xi. In the following, the notation Xj = xi — xg will be used. Let 
us denote by x, y and z the unit vectors of our laboratory coordinate system. For the sake of clarity, the component 
of Xi along the x, y and z axis will be denoted by x$ i, Xi^ and Xi } 3 respectively. 
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Constraining the translation is equivalent to freeze the movement of the center of mass, i.e. to apply the following 
linear constraints: 



N 



CTi = E ^ K 1 - 40 = 

i=l 
N 



cr-2 = 



i=l 
N 



i=l 



(Cla) 
(Clb) 
(Clc) 



Constraining the rotation can be done by using the second Eckart conditions. 44 These conditions minimize the 
angular momentum due to small displacements: they provide an approximate way to constrain the global rotation 
during a molecular dynamic simulation: 



that is: 



(T4 = x ■ 



&5 = y 



0"6 = Z 



N 



N 



J2 m (x° x x, 

•8=1 

N _ 

5^ rrn (x° x x;) 

i=l 

N 

E m * X **) 



= 



= 



1=1 



i=i 

JV 



i=l 
JV 



°6 = J2 



rrn (x° !Xi,2 - ^,1^,2) = 



i=i 



These last equations can be written as linear constraints: 



N 

E 



AT 



JV 



= ^ W X°2 

i=l 

JV 



2^ M 



JV 



°"5 = E m ' 

i=l 

°"6 = E m * 1 ^ I X M 

i=l 



_/\,f -J' 



JV 

E 



M 



2 - Xi, 2 X it3 



3 ~ s i,3 2%1 



J ~0 



M 



c i4 Xi > 1 I ^,2 



E- 

3=1 

JV 

E 

A' 

E 



J',3 



M 



3 ™0 



M J ' 2 



= 



= 



(C2a) 
(C2b) 
(C2c) 

(C3a) 
(C3b) 
(C3c) 



(C4a) 
(C4b) 
(C4c) 



APPENDIX D: ACTUALLY CALCULATING EQ. 55 

In this appendix, we propose one way to calculate the derivatives of the free energy A along q n , using a simulation 
in which only £ is constrained. As an example, we consider that the coordinates qi to q n were inactive at step k — 1 
and become active at step k. 

The first step is to use the expressions for the generalized coordinates to obtain the values for , Z qn , and 
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The main difficulty in using equation (55) is that one must ensure that the sampling of q n around is sufficient. 
In practice, this imposes to have long MD simulations, with approximately 100000 steps for each considered inactive 
coordinate. One way to circumvent this problem is to use a Taylor expansion of the derivative of the potential around 

j=n l=n ^ v \ 

' «? 

The simulation data is then used to fit the coefficients of this expression. The resulting equation is then plugged 
into equation (55). 
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